Genetics in Medicine

Elsevier BV

Preprints posted in the last 7 days, ranked by how well they match Genetics in Medicine's content profile, based on 69 papers previously published here. The average preprint has a 0.09% match score for this journal, so anything above that is already an above-average fit.

1
Prevalence and Clinical Significance of Adult-Onset Cancer Predisposition Variants in Pediatric Oncology

Maciaszek, J. L.; Pastor Loyola, V.; Cain, T.; Cardenas, M.; Blackburn, P. R.; Wilkinson, M. R.; Koo, S. C.; Wu, C.-H.; Li, C.; Wang, L.; Nichols, K. E.; Klco, J. M.; Eldomery, M. K.

2026-06-08 genetic and genomic medicine 10.64898/2026.06.07.26354365 medRxiv
Top 0.1%
38.0%

Purpose: Pathogenic or likely pathogenic (P/LP) variants are increasingly identified in genes more commonly associated with adult-onset cancer predisposition, but their prevalence and relevance to children who present with cancer remain unclear. Methods: We retrospectively analyzed 1,280 consecutive pediatric patients with cancer who underwent clinical germline sequencing, using a virtual panel, from 2021 to 2024. Genes with P/LP variants were categorized as adult-onset cancer predisposition genes (aoCPG) or pediatric-onset cancer predisposition genes (poCPG) according to cancer risk before age 18 years and pediatric surveillance recommendations. Variant relevance was adjudicated using tumor diagnosis/histopathology, immunohistochemistry, and tumor molecular features and classified as primary, secondary, or indeterminate. Results: Among 1,280 patients, 197 (15.4%) harbored 211 P/LP variants across 54 genes. Sixty-six variants (31.3%) occurred in aoCPG, 87 (41.2%) in poCPG, and 58 (27.5%) were heterozygous variants in autosomal recessive genes. Among adult-onset variants, 7 (10.6%) were primary, 54 (81.8%) secondary, and 5 (7.6%) indeterminate. Among pediatric-onset variants, 77 (88.5%) were primary and 10 (11.5%) secondary. Six patients (3 adult-onset variants; 3 pediatric-onset variants) received targeted therapy informed by germline/somatic sequencing results. Conclusion: In pediatric oncology, most variants in aoCPG are secondary rather than tumor-related findings. Tumor-informed interpretation, beyond variant classification, may improve reporting, counseling, and therapeutic decision-making.

2
Rare neurological and neurodevelopmental variants in ALS link to onset, survival and family history

O'Donoghue, C.; Kacar, E.; Gomes, T.; Costello, E.; Pender, N.; Peelo, C.; Ryan, M.; Heverin, M.; Byrne, S.; Bede, P.; Hardiman, O.; McLaughlin, R. L.; Byrne, R. P.

2026-06-10 genetic and genomic medicine 10.64898/2026.06.09.26354977 medRxiv
Top 0.2%
14.8%

Background: Neurological, neuropsychiatric, and neurodevelopmental disorders cluster in ALS families, sharing a common genetic architecture with ALS. Pathogenic variants in genes associated with other neurological, neurodevelopmental, or neuropsychiatric disorders may also co-occur in ALS and modify phenotype. We have sought to determine the prevalence and clinical pattern of likely-pathogenic/pathogenic (LP/P) non-ALS neurological, neurodevelopmental, and neuropsychiatric variants, alone and in combination with ALS-gene variants, in two large ALS cohorts. Methods: Whole-genome sequencing (WGS) of 469 Irish and 774 Answer ALS people with ALS (pwALS) was analysed for ClinVar LP/P variants associated with other neurological (n = 15541), neurodevelopmental (n = 9761), and neuropsychiatric (n = 321) phenotypes. Inheritance patterns for associated genes (autosomal recessive/autosomal dominant) along with the associated phenotype were validated using OMIM. Standardised clinical data included family history, site and age of onset, El Escorial category, survival, motor decline, and cognitive and behavioural assessments. Known ALS-gene variants and C9orf72 repeat expansion status were included for each cohort. Results: Non-ALS neurological variants were identified in 47/469 (10.0%) Irish and 69/774 (8.9%) Answer ALS participants, most frequently in hereditary spastic paraplegia-associated genes (3.2% Irish; 2.8% Answer ALS). Irish neurological variant carriers showed higher frequency of respiratory onset (10.6% vs 1.2%, Fisher's exact p = 0.002, Φ = 0.20) and fewer premorbid behavioural symptoms (0.92 ± 0.56 vs 3.08 ± 0.97, Cohen's d = -0.40). Neurodevelopmental variants occurred in 12/469 (2.6%) Irish and 20/774 (2.6%) Answer ALS participants.
In the Irish cohort, neurodevelopmental variant carriers had significantly shorter survival in Cox proportional hazards model (log-rank p = 0.005), corresponding to a more than two-fold increased hazard of death (HR = 2.25, 95% CI 1.26-4.00), and had significantly increased familial burden of neuropsychiatric disorders among first- and second-degree relatives (negative binomial IRR for carriers = 2.41, 95% CI: 1.12-5.18, p = 0.025). Across combined cohorts, 18 individuals (Irish n = 8; Answer ALS n = 10) carried ≥2 LP/P variants spanning ALS and non-ALS genes. Conclusion: Rare LP/P variants in genes associated with other neurological and neurodevelopmental disorders occur in up to 12% of pwALS across two independent cohorts. Carriers show distinct phenotypes, shorter survival, and characteristic family history patterns. These findings suggest that extended pleiotropic and oligogenic architectures may contribute to ALS heterogeneity.

3
Whole-exome-based preconception carrier screening in Uzbekistan with targeted SMA, FMR1, and DMD assays: the first reported clinical program

Kullyev, A.; Avdeichik, S.; Akimenkova, A.; Kartuesov, A.; Kardymon, O.; Goikhman, Y.

2026-06-04 genetic and genomic medicine 10.64898/2026.06.02.26354713 medRxiv
Top 0.2%
10.6%

Purpose: Published clinical outcome data on preconception carrier screening (PCS) in Central Asia are limited. We report the first clinical implementation study from Uzbekistan of a whole-exome sequencing (WES)-based multi-platform PCS program combining exome sequencing with targeted SMA, FMR1, and DMD assays. Methods: We retrospectively analyzed anonymized data from 65 individuals (19 couples, 27 singletons) screened at IMC Genomics, Tashkent, between January 2024 and May 2026. WES covering the protein-coding regions of approximately 20,000 genes was followed by exome-wide bioinformatics filtering and clinical geneticist interpretation. Partly overlapping cohorts underwent SMA carrier screening (n=179), FMR1 CGG-repeat analysis in females (n=155), and DMD deletion/duplication testing in preconception females (n=29). Variants were classified by ACMG/AMP criteria against gnomAD v4.1. Results: Sixty-one of 65 WES-screened individuals (93.8%; 95% CI 85.2-97.6%) carried at least one reportable variant (152 instances across 126 genes). Four of 19 couples (21.1%; 95% CI 8.5-43.3%) were concordant for pathogenic or likely pathogenic variants in the same autosomal recessive gene; two were referred for preimplantation genetic testing for monogenic disease. SMA screening identified four carriers, including two 2+0 silent carriers; FMR1 analysis identified one intermediate allele; DMD MLPA identified no exonic rearrangements. Conclusion: This first reported WES-based multi-platform PCS program in Uzbekistan was feasible and clinically informative, identifying actionable couple-level reproductive risks and supporting structured implementation of reproductive genetic screening in Central Asia.
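
Editorial note on the confidence intervals above: the abstract does not say how its 95% CIs were computed, but both reported intervals (61 of 65 and 4 of 19) are consistent with a Wilson score interval, which behaves better than the simple normal approximation for small samples and proportions near 0 or 1. The sketch below is a minimal illustration under that assumption; the function name `wilson_interval` is ours, not the authors'.

```python
from statistics import NormalDist


def wilson_interval(successes: int, n: int, level: float = 0.95) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (assumed method, see lead-in)."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half_width = z * (p * (1 - p) / n + z**2 / (4 * n**2)) ** 0.5 / denom
    return centre - half_width, centre + half_width


# Counts taken from the abstract: individuals with a reportable variant,
# and couples concordant for the same autosomal recessive gene.
for label, k, n in [("reportable variant", 61, 65), ("concordant couples", 4, 19)]:
    lo, hi = wilson_interval(k, n)
    print(f"{label}: {k}/{n} = {k / n:.1%} (95% CI {lo:.1%}-{hi:.1%})")
```

Run as written, this prints 85.2%-97.6% for 61/65 and 8.5%-43.3% for 4/19, matching the intervals in the abstract. The width of the second interval shows how little a 19-couple sample constrains the true concordance rate.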

4
Breast cancer polygenic risk score performance varies by socioeconomic status

Domian, H. I.; Tian, X.; Ong, D.; Hamilton, L.; Shieh, Y.; Musharoff, S. A.

2026-06-04 genetic and genomic medicine 10.64898/2026.06.03.26354819 medRxiv
Top 0.2%
10.1%

Background: Polygenic risk scores (PRS) for breast cancer are increasingly used for risk stratification to inform screening and prevention. However, for PRSs to be equitable and clinically useful, they need to perform well across diverse populations. While PRS performance is known to be ancestry-dependent, it is not well understood how environmental context, such as that of socioeconomic status (SES), affects PRS transferability. Here, we assess whether SES, measured via self-reported household income, modifies breast cancer PRS performance and, if so, whether socioeconomic context contributes predictive information beyond genetic risk alone. Methods: We used the US-based All of Us biobank to evaluate how SES impacts breast cancer PRS performance. First, we quantified changes in breast cancer PRS performance by modeling a commonly-cited polygenic score for breast cancer previously described by Mavaddat et al. with SES. We then reestimated the genetic effect sizes of the 3,820 variants from Mavaddat et al. in All of Us with and without income as a covariate. Because social determinants of health affect breast cancer detection and outcomes, we stratified analyses by socially defined populations on the basis of self-identified race and ethnicity. We further stratified individuals whose self-identified race is White (''White'') into three SES groups (high, middle, low) based on self-reported income and re-estimated genetic effect sizes to create SES-specific PRSs. We then applied these PRSs to White participants, the largest group in the study, and to Black or African American (''Black'') and Hispanic or Latino (''Hispanic'') participants, groups underrepresented in breast cancer research. Model discrimination between cases and controls was measured by area under the curve (AUC). 
Results: We analyzed 163,715 women from the All of Us biobank, which included 8,833 breast cancer cases (6,619 White, 1,178 Black, and 1,036 Hispanic), with relative income available for a subset of these cases (5,525 White, 848 Black, and 566 Hispanic). The ancestry-dependent performance of the breast cancer PRS described in Mavaddat et al. was replicated in All of Us. In Black individuals, this PRS (AUC and 95% CI: 0.576 [0.571, 0.582]) produced a similar increase in AUC as relative income (AUC: 0.573 [0.568, 0.577]) when added to an age-only model. Incorporating income with PRS, age, and genetic PCs 1-3 improved AUC by 0.007 in White Americans and 0.018 in Black Americans (both p < 10^-11), while attenuating the contribution of PRS in the full model. PRS performance also varied among SES categories. Notably, PRSs with variant effect sizes that were recalibrated in low-SES White participants performed best in low-SES White participants (AUC: 0.605 [0.583, 0.628]) and Black Americans (AUC: 0.588 [0.586, 0.591]), both better than performance in high-SES White Americans (AUC: 0.579 [0.577, 0.580]) and middle-SES White Americans (AUC: 0.578 [0.569, 0.586]). Conclusion: Socioeconomic context, measured by income, significantly impacts the transferability of a PRS for breast cancer within and among groups defined by self-identified race and ethnicity. Accounting for SES improves PRS performance, most notably in Black Americans and low-SES White individuals.

5
Documented clinical genetic testing among carriers of hereditary breast and ovarian cancer variants: Ancestry and socioeconomic disparities in the All of Us research program

Yerukala Sathipati, S.; Scott, H.

2026-06-10 oncology 10.64898/2026.06.09.26355262 medRxiv
Top 0.3%
7.0%

Importance: Hereditary breast and ovarian cancer (HBOC) variant carriers benefit from risk-reducing interventions, but only if identified. The extent to which carriers are clinically recognized, and whether recognition is equitable across diverse populations, is poorly characterized in a single large U.S. cohort. Objective: To estimate P/LP HBOC carrier prevalence across genetic ancestry groups, quantify documented clinical genetic testing among carriers, and evaluate ancestry and socioeconomic disparities in testing. Design, Setting, and Participants: Cross-sectional analysis of the All of Us Research Program Controlled Tier (Curated Data Repository v8/C2024Q3R9), comprising participants with short-read whole genome sequencing and linked electronic health record (EHR) and survey data. Carriers were ascertained from research genomic data independent of clinical testing. Exposures: Genetically inferred ancestry (African [AFR], Admixed American [AMR], East Asian [EAS], European [EUR], Middle Eastern [MID], South Asian [SAS]); self-reported household income and educational attainment. Main Outcomes and Measures: (1) Carrier prevalence with Wilson 95% CIs; (2) documented clinical genetic testing (procedure codes) among carriers; (3) adjusted odds of documented testing among women, by ancestry, before and after socioeconomic adjustment, using multivariable logistic regression. Results: Among 414,830 participants, P/LP HBOC carrier prevalence was 1.42% (95% CI, 1.38-1.45) overall and similar across ancestry groups (AFR 1.24%, AMR 1.32%, EAS 1.19%, EUR 1.52%, MID 1.68%, SAS 1.33%; overlapping CIs). Among 250,071 women in the testing analysis, documented clinical genetic testing was rare: only 74 of 5,878 carriers overall (1.3%) and 59 of 3,572 European-ancestry carriers (1.7%) had a documented test, with counts below reportable thresholds in all other ancestry groups. 
African-ancestry women had lower adjusted odds of documented testing than European-ancestry women (Model 1 adjusted odds ratio [aOR], 0.32; 95% CI, 0.27-0.39), an association that attenuated but persisted after adjustment for income and education (Model 2 aOR, 0.48; 95% CI, 0.40-0.58; P < 0.001); Admixed American women also had reduced adjusted odds (aOR, 0.71; 95% CI, 0.61-0.84). Lower income and lower education were independently and dose-dependently associated with lower testing odds (income <$25,000 aOR, 0.46; high-school education aOR, 0.54). Conclusions and Relevance: High-risk HBOC variant carriers are present across all ancestry groups at similar frequencies, yet documented clinical genetic testing was disparate in the different ancestry groups. African-ancestry women experience a testing gap that is not fully explained by socioeconomic position, implicating structural barriers in access and referral. Population-level strategies that decouple carrier identification from current referral pathways may be required to close this gap.

6
Metatranscriptomics-Derived Disease Risk Scores as a Preventive, Diagnostic, and Treatment Support Tool

Hu, L.; Bass, M.; Patridge, E.; Molusky, M.; Antoine, G.; Vuyisich, M.; Banavar, G.

2026-06-06 genetic and genomic medicine 10.64898/2026.05.29.26354333 medRxiv
Top 0.3%
6.6%

Background: Chronic diseases and symptom syndromes often develop after prolonged biological changes that may precede formal diagnosis. RNA-based metatranscriptomics captures active microbial and human gene expression and may provide a functional layer for disease risk evaluation. To address this translational gap, we developed and validated a Disease Risk Score (DRS) framework that integrates metatranscriptome-derived pathway activity scores from stool, saliva, and blood samples, and evaluated its potential clinical utility as an adjunct risk-evaluation tool. Methods: DRS uses disease-specific sets of pathway activity scores derived from stool and saliva microbial functions, stool and saliva microbial taxa, and blood human gene expression. For each disease, 'not optimal' pathway scores are aggregated into a normalized cumulative odds ratio, or cOR, using score-level odds ratios, statistical significance, and literature-supported biological relevance derived from a Development Cohort of 22,369 individuals. A cOR ≥ 5 is defined as high risk. Performance is evaluated in an independent Validation Cohort of 15,908 individuals using self-reported diseases as the reference. Disease support requires both significant cOR separation between self-reported and not-reported (Cohen's d ≥ 0.2) and risk ratio enrichment of self-reported disease among individuals classified as high risk (95% CI of Risk Ratio > 1). Results: Of 20 initially evaluated diseases, 15 meet the prespecified validation criteria on the independent validation cohort: ADHD, anxiety, chronic fatigue syndrome, depression, GERD, hypertension, inflammatory bowel disease, IBS-C, IBS-D, insomnia, MASLD, obesity, obstructive sleep apnea, Sjogren's syndrome, and type 2 diabetes.
Five selected clinical scenarios illustrate how DRS can support clinician-mediated decision making, including IBS subtype reclassification, improved diagnostic acceptance in IBS-D, personalized lifestyle counseling in MASLD and early type 2 diabetes, and diagnostic uncertainty in atypical GERD. Conclusions: DRS is a metatranscriptomics-based risk-stratification framework that aggregates active microbial and human pathway signals into interpretable disease-specific risk estimates across a wide range of disease conditions. Validation against self-reported disease labels in an independent cohort shows significant risk enrichment for each of 15 diseases. DRS is intended as an adjunct to clinical evaluation: a decision support tool in situations where routine care encounters uncertainty, delay, or low patient engagement. Future prospective studies using clinically adjudicated endpoints are needed to assess calibration and clinical outcomes.
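
Editorial note on the two validation criteria in this abstract (Cohen's d of at least 0.2, and a risk ratio whose 95% CI lies above 1): the sketch below shows one common way such a check can be computed. It is an illustration only. The pooled-SD form of Cohen's d and the log-scale (Katz) interval for the risk ratio are our assumptions, the function names are hypothetical, and every number in the example is made up rather than taken from the study.

```python
import math
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Standardized mean difference using the pooled sample standard deviation."""
    na, nb = len(group_a), len(group_b)
    pooled_var = ((na - 1) * variance(group_a) + (nb - 1) * variance(group_b)) / (na + nb - 2)
    return (mean(group_a) - mean(group_b)) / math.sqrt(pooled_var)


def risk_ratio_ci(cases_high, n_high, cases_rest, n_rest, z=1.96):
    """Risk ratio (high-risk group vs. everyone else) with a log-scale Katz CI."""
    rr = (cases_high / n_high) / (cases_rest / n_rest)
    se_log = math.sqrt(1 / cases_high - 1 / n_high + 1 / cases_rest - 1 / n_rest)
    lower = math.exp(math.log(rr) - z * se_log)
    upper = math.exp(math.log(rr) + z * se_log)
    return rr, lower, upper


def meets_criteria(d, rr_lower, d_min=0.2):
    """Both conditions named in the abstract: d >= 0.2 and the whole RR interval above 1."""
    return d >= d_min and rr_lower > 1.0


# Hypothetical data: cOR values for people who do / do not self-report a disease,
# and self-reported cases among those classified high risk (cOR >= 5) vs. the rest.
cor_reported = [6.1, 4.8, 7.3, 5.5, 3.9]
cor_not_reported = [3.2, 4.1, 2.7, 5.0, 3.6]
d = cohens_d(cor_reported, cor_not_reported)
rr, lower, upper = risk_ratio_ci(cases_high=30, n_high=100, cases_rest=10, n_rest=100)
print(f"d = {d:.2f}, RR = {rr:.2f} (95% CI {lower:.2f}-{upper:.2f}), supported: {meets_criteria(d, lower)}")
```

Requiring the lower bound of the interval to exceed 1, rather than the point estimate alone, keeps a disease from passing on a small and noisy high-risk group.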

7
Contextualizing the Utility of Polygenic Risk Scores using Absolute Risk Models in Diverse Ancestry Populations

Chatterjee, N.; Martina, F.; Kachuri, L.; Natarajan, P.; Witte, J.; Huo, D.

2026-06-04 genetic and genomic medicine 10.64898/2026.06.03.26354842 medRxiv
Top 0.3%
6.2%

Polygenic risk scores (PRSs) are emerging as powerful tools for quantifying inherited risk for common diseases and, in some cases, are approaching clinical implementation. A major concern for PRS implementation is their limited accuracy in non-European populations, particularly in those of African ancestry. However, past evaluations have focused on metrics such as relative risk or AUC, which do not capture background risk arising from contextual factors. We introduce a novel measure of variable importance, the conditional average derivative estimator (CADE), to evaluate PRS utility across diverse contexts and populations within absolute risk models that integrate PRSs with other relevant risk factors. We illustrate this framework by integrating PRSs for breast and prostate cancer within age-specific absolute risk models for incidence and mortality fit using individual-level data from the All of Us Research Program with inputs from the National Cancer Institute SEER cancer registry. Our projections show that although the PRSs are known to have the lowest discriminatory accuracy in African Americans (AA), there are contexts in which they provide greater utility, such as for the stratification of prostate cancer risk and mortality, where the CADE values for AA were 2- and 7-fold higher than for European Americans. These findings suggest that conclusions about the limited clinical utility of PRS in non-European populations may be premature and underscore the need to quantify PRS risk-stratification utility at the absolute-risk level, while accounting for disease onset, survival, and broader health and economic factors.

8
Incremental Clinical Value of Single-Molecule Nanopore Sequencing in Thalassemia Testing: A Prospective Double-blind, Multicenter Study

Xiang, J.; Zhu, B.; Xu, H.; Chen, Y.; Sun, X.; xiang, r.; Zhao, Y.; Liu, W.; Zhang, L.; He, J.; liu, j.; Chen, Y.; Fan, Z.; Zhang, H.; Tan, J.; Pang, L.; Shi, L.; Kong, Y.; Cai, A.

2026-06-09 hematology 10.64898/2026.06.09.26354559 medRxiv
Top 0.4%
4.4%

Background Thalassemia is one of the most common monogenic disorders worldwide, but current screening strategies combining hematological testing with molecular assays still carry a risk of missed diagnoses and undesirable efficiency, particularly for complex structural variants and rare mutations. Methods In this prospective double-blind, multicenter cohort study of 3,842 participants (3,362 pregnant women and 480 male partners), we conducted a head-to-head comparison to systematically evaluate the incremental clinical value and detection performance of single-molecule nanopore sequencing in thalassemia (SMITH) against conventional hematological testing and next-generation sequencing (NGS). Findings The overall concordance rate between NGS and SMITH was 98.6% (3789/3842). The discrepant cases (n=53) were directly attributed to the superior detection capabilities of SMITH, which successfully identified complex structural rearrangements-including 45 -globin gene triplications and four HK alleles-that were missed by NGS. Furthermore, SMITH accurately detected four rare variants (c.134_135insT/, c.-22(C>T)/, βN/βc.316-290delinsAGGGCAATAATTT and β3.5 kb deletion/βN) and resolved ten trans and three cis configurations within the globin gene allele. Clinically, these technical advantages translated to a 9.3% (5/54) increase in the detection rate of high-risk prenatal couples, effectively preventing one birth affected by moderate-to-severe thalassemia. Additionally, SMITH corrected a diagnostic discrepancy in one case (HK vs. -3.7), sparing the couple from an unnecessary invasive procedure. Interpretation Our findings demonstrate that SMITH provides a powerful platform for resolving globin gene rearrangements, detecting rare variants, and enabling direct haplotype phasing. By effectively eliminating diagnostic blind spots, SMITH is expected to become an optimal method for thalassemia prevention programs.
Funding This study was supported by Chinese National Natural Science Foundation Projects 81760037 and 82271894.

9
Heterozygous MMACHC burden variants are associated with higher circulating vitamin B12 in the All of Us Research Program

Cai, L.; DeBerardinis, R. J.

2026-06-04 genetic and genomic medicine 10.64898/2026.06.03.26354855 medRxiv
Top 0.4%
3.7%

Heterozygous carriers of autosomal recessive disease variants are conventionally considered unaffected, yet population-scale genomic datasets reveal subclinical carrier phenotypes. MMACHC encodes a cobalamin-processing protein whose biallelic loss causes cobalamin C deficiency, an inborn error of intracellular cobalamin metabolism. We performed an unbiased quantitative phenome-wide association screen in All of Us Research Program v8 to identify phenotypes associated with rare heterozygous MMACHC burden variants. Serum/plasma vitamin B12 was the top quantitative association. Carriers had higher circulating B12 than non-carriers in adjusted analyses, but also higher homocysteine, suggesting that elevated circulating B12 does not reflect improved intracellular cobalamin function. Carriers were less likely to fall below conventional B12 insufficiency thresholds, indicating a potential diagnostic blind spot. A pathway-wide rare-variant gene-burden (All-by-All) analysis placed this finding in broader biological context. Burdens in genes related to circulating B12 binding or intestinal absorption were associated with lower circulating B12. In contrast, burdens in several genes involved in cellular delivery and intracellular cobalamin handling were associated with higher circulating B12. This step-specific directionality supports a model in which elevated circulating B12 can reflect impaired cellular handling and consequent systemic accumulation rather than improved cellular cobalamin availability. Because EHR-derived B12 is shaped by heterogeneous clinical and medication contexts, prospective carrier-enriched studies with standardized methylmalonic acid, homocysteine, diet, supplement, medication, comorbidity, and symptom ascertainment are needed to evaluate functional-marker-based screening.

10
More Than Results: A Qualitative Study on the Role of Person-Centered Genetic Counseling in Parkinson Disease Research

Verbrugge, J.; Fiallos, K.; Cook, L.; Miller, M.; Head, K. J.

2026-06-09 genetic and genomic medicine 10.64898/2026.06.03.26354465 medRxiv
Top 0.5%
3.1%

As genetic testing becomes increasingly integrated into Parkinson disease (PD) research, including targeted testing for variants in LRRK2 and GBA1, the return of individual research results is becoming more common. However, limited qualitative data exist regarding how research participants experience genetic results disclosure and post-test genetic counseling in PD research settings. We conducted semi-structured qualitative interviews with participants (n=13) enrolled in the Parkinson Precision Medicine Initiative (formerly Parkinson Progression Markers Initiative; PPMI) who had received PD-related genetic test results and post-test genetic counseling. Interviews were conducted 1 to 3 weeks following result disclosure and analyzed using thematic analysis with a primarily deductive coding approach informed by study aims and inductive identification of emergent themes. Four primary themes were identified: (1) personal connection and motivations for participation, (2) centrality of result disclosure and information preferences, (3) emotional experiences and support needs, and (4) communication quality and alignment with participant needs. Overall, our findings underscore the importance of person-centered genetic counseling within PD research. As return of genetic and biomarker results in research and clinical trial contexts expands, thoughtful integration of relational, informational, and communication-focused practices will be essential to support participant engagement and trust.

11
Genosolver: Rare Disease Diagnosis through Holistic Integration of Unstructured Clinical Narratives Using Large Language and Reasoning Models

Islam, T.; Danner, M.; Ziad, Z.; Begemann, M.; Beijer, D.; Lischka, A.; Lausberg, E.; Mattern, L.; Suh, J.; Wittig, P.; Guezel, N.; Schlaich, E.; Karaivanova, R.; D'Augello, S.; Franken, L.; Ruedebusch, J.; Mueller, R.; Perchalla, E.; Zempel, H.; Haag, N.; Eggermann, K.; Eggermann, T.; Meyer, R.; Kraft, F.; Elbracht, M.; Kurth, I.; Krause, J.

2026-06-05 health informatics 10.64898/2026.06.04.26354845 medRxiv
Top 0.5%
2.4%

Background: Molecular medicine has made genetic diagnostics crucial for rare diseases, but the majority of patients remain without a diagnosis even after state-of-the-art assessment. Standardized systems for integrating clinical features, such as the Human Phenotype Ontology (HPO), offer assistance, but are often insufficiently detailed and fail to capture crucial clinical parameters such as age at onset, longitudinal changes in symptoms, detailed characteristics of a clinical symptom, or the absence of a feature. Results: We present Genosolver, an integrated workflow that utilizes machine learning to address this bottleneck. Using Large Language Models (LLMs) and Large Reasoning Models (LRMs) on unstructured clinical notes and electronic health care data, we generate a workflow that unifies phenotype extraction, generates differential diagnosis, and prioritizes genetic variants from genome data. We evaluated the performance on 233 previously genetically solved cases, where Genosolver ranked the causative gene first in 72% of cases and in 94% of cases in the top 10 gene list, outperforming the existing benchmarking tool Exomiser by 9%. Semi-automated reanalysis of 1,875 unsolved rare disease cases yielded an additional diagnostic rate of 1.7%. Incorporating rich, unstandardized clinical narratives substantially enhanced model performance beyond HPO-only inputs and demonstrated competitive results using data security compliant local models. Conclusion: Integrating unstandardized clinical data with local LLMs and reasoning offers a scalable, data-secure workflow that increases molecular diagnoses in rare diseases.

12
Human genetic evidence links serine biosynthesis to diabetic peripheral neuropathy

Fridman, V.; Kakar, A.; Jensen, A.; Van de Vondel, L.; Wheeler, A.; Phillips, L. S.; Zhou, J.; Zuchner, S.; Reusch, J.; Raghavan, S.

2026-06-10 genetic and genomic medicine 10.64898/2026.06.09.26355286 medRxiv
Top 0.5%
2.4%

Diabetic peripheral neuropathy (DPN) is a common and disabling condition for which no disease-modifying therapies are available. Glycemic and metabolic drivers do not fully explain why only a subset of individuals with diabetes develop DPN, and genetic contributors remain poorly defined. We aimed to perform a multi-population genome-wide association study (GWAS) of DPN to highlight potential new etiological pathways and therapeutic targets. Methods We performed a multi-population GWAS of neuropathy in people with and without diabetes using the VA Million Veteran Program and UK Biobank, followed by replication in the All of Us Research Program (AoU), and gene-based and gene-set analyses to identify implicated pathways. Causal relationships between circulating serine levels and DPN were further tested using two sample Mendelian randomization. To further evaluate pathogenic potential, we analyzed rare, high impact variants in GWAS implicated genes among individuals with unresolved inherited neuropathies using the GENESIS platform. Findings Among individuals with type 2 diabetes, we identified seven genome wide significant loci (p<5x10-): PHGDH and PSPH (key serine synthesis genes), TEAD1, CYP4F11, LARGE1, FTO, and COBLL1. No loci were significant in individuals without diabetes or with type 1 diabetes. Four loci (PHGDH, TEAD1, FTO and CYP4F11) replicated in AoU (p <0.05). Mendelian randomization demonstrated that higher genetically predicted serine levels were associated with lower DPN risk, consistent with a causal role of serine metabolism in disease pathogenesis. Rare-variant burden analyses revealed associations of predicted deleterious variants with inherited neuropathy case status in PHGDH (odds ratio [OR] 12.7 [95% CI 7.9, 20.4]), PSPH (OR 8.5 [7.2, 10.2]), PHKG1 (OR 4.8 [3.7, 6.3]), and LARGE1 (OR 0.007 [0.0004, 0.1]). Interpretation Convergent genetic evidence across common and rare variation implicates serine synthesis as a key pathway in DPN. 
These findings link diabetic and inherited neuropathies through a shared metabolic mechanism, identifying serine metabolism as a potential therapeutic target.

13
Natural History of Prenatally Identified Children with 48,XXYY Syndrome in Infancy and Early Childhood

Nocon, K.; Swenson, K.; Bothwell, S.; Howell, S.; Davis, S.; Ikomi, C.; Ross, J.; Tartaglia, N.

2026-06-04 pediatrics 10.64898/2026.06.04.26353909 medRxiv
Top 0.6%
2.0%

Background: 48,XXYY syndrome is a rare sex chromosome aneuploidy (SCA) characterized by neurodevelopmental deficits and medical comorbidities. The information available in the literature is almost exclusively limited to postnatally diagnosed cases. This study aims to describe the early medical and developmental features of prenatally identified 48,XXYY infants, with comparisons to 47,XYY and 47,XXY cohorts and typical populations, as well as previously reported postnatally diagnosed 48,XXYY cases. Methods: The eXtraordinarY Babies Study prospectively follows children prenatally identified to be at high risk for SCA with annual medical and neurodevelopmental evaluations. Data presented herein include the prevalence of medical conditions, developmental milestones, developmental and adaptive functioning assessment scores, and therapy utilization in participants confirmed to have 48,XXYY. Comparisons were made between this cohort and the typical population, infants with 47,XYY and 47,XXY also enrolled in the eXtraordinarY Babies Study, and a 2008 cohort of individuals with postnatally identified 48,XXYY. Results: Infants with 48,XXYY exhibited a range of early medical features, including high rates of feeding and GI disorders (breastfeeding difficulties, gastroesophageal reflux, and eosinophilic esophagitis), allergic disorders (food allergies and environmental allergies), and hypotonia. Developmental and adaptive functioning scores indicated delays in motor, communication, and social domains, with nearly all infants receiving speech therapy, physical and/or occupational therapy. Comparisons with the 47,XYY and 47,XXY cohorts revealed more medical and developmental challenges in the 48,XXYY group; however, there was variability and some overlap with both the general population and sex chromosome trisomy conditions.
Additionally, comparison to the 2008 postnatally identified 48,XXYY cohort indicated that while prenatal diagnosis allowed for earlier intervention, developmental outcomes in the first years of life were similar between the two groups. Conclusions: 48,XXYY diagnosed prenatally facilitates early monitoring, anticipatory guidance, and proactive referrals for medical evaluations and intervention, given developmental delays and medical challenges are more common in infancy and early childhood compared to the general population and trisomy SCAs. These findings provide valuable insights for genetic counselors and healthcare providers, emphasizing the spectrum of medical and developmental findings and importance of early and proactive care to support individual outcomes. Prospective study of this prenatally identified cohort will provide important natural history and phenotypic variability in XXYY, as well as identification of predictors of health and developmental outcomes.

14
Three-Month Observational Data for the MPS IIIB Sentinel Subject Following AAV9 Mediated Gene Therapy

Ma, X.; Gu, R.; Ma, W.; Xu, Q.; Wang, R.; Wang, W.; Liang, M.; Liu, X.; Yang, X.; Zhuang, L.; Zhang, W.; Zeng, X.; Xu, J.; Xu, X.; Wu, Z.; Xia, Y.; Liu, Y.; Zhou, J.; Zhu, X.; Wang, H.; Dong, Z.; Yang, W.; Dai, Y.; Pan, X.; Li, X.; Wang, Y.; Dong, X.; Wu, X.; Feng, Z.

2026-06-09 neurology 10.64898/2026.06.01.26354386 medRxiv
Top 0.6%
1.8%

Background: Mucopolysaccharidosis type IIIB (MPS IIIB) is a devastating neurodegenerative lysosomal storage disorder caused by alpha-N-acetylglucosaminidase (NAGLU) deficiency. There is currently no approved therapy. We report the 3-month outcomes of a novel intracerebroventricular (ICV) gene therapy in a child with MPS IIIB. Methods: In an open-label, single-center, investigator-initiated trial (ChiCTR2600121466), a single dose of RDGT-101 (2.0E14 vg of an AAV9 vector encoding human NAGLU) was administered via ICV infusion. Primary outcomes were safety and tolerability. Secondary outcomes included serum NAGLU activity, urinary heparan sulfate (HS) excretion, and neurocognitive function. Exploratory analyses included hematological parameters. Results: The patient achieved serum NAGLU activity (17.06 nmol/mL/hour) approaching that of healthy controls (17.75 ± 1.37 nmol/mL/hour) by Month 3, accompanied by a 58.4% reduction in urinary HS. Clinically, previously severe hand and toe contractures resolved, allowing for full extension. Neurocognitive improvements were observed, including clear articulation, logical conversation, and sustained eye contact. Hematological analyses revealed normalized red blood cell indices and improved iron utilization. No dose-limiting toxicities, serious adverse events, or clinically significant laboratory abnormalities were observed. Conclusions: A single ICV infusion of RDGT-101 was safe and well-tolerated in this patient with MPS IIIB. Early biochemical correction was accompanied by marked improvements in somatic, neurocognitive, and hematological parameters. These findings support further investigation of ICV AAV9 gene therapy for MPS IIIB.

15
Multi-ancestry analysis of POLG variants in Parkinson's disease

Tay, Y. W.; Elsayed, I.; Yeow, D.; James, M.; Kung, P.-J.; Screven, L.; Dilliott, A. A.; Alcalay, R. N.; Fang, Z.-H.; Tan, A. H.; Global Parkinson's Genetics Program (GP2); Sue, C. M.; Lange, L. M.; Perinan, M. T.

2026-06-08 genetic and genomic medicine 10.64898/2026.06.07.26354811 medRxiv
Top 0.7%
1.4%

Introduction: Variants in the polymerase gamma (POLG) gene are associated with a wide range of mitochondrial disorders. Emerging evidence suggests a potential link between POLG variants and Parkinson's disease (PD); yet, results remain inconclusive. Objectives: To investigate the genetic spectrum and prevalence of POLG variants in PD across diverse ancestries. Methods: We leveraged multi-ancestry genetic data from the Global Parkinson's Genetics Program (GP2), including genotyping data from 98,589 and short-read sequencing data from 36,022 individuals. We performed a POLG rare variant screen, case-control association, and gene-level burden analyses. Results: Five PD cases carried potentially biallelic rare pathogenic/likely pathogenic POLG variants. Additionally, 228 individuals (<1%; 161 PD cases, 28 individuals with other neurological disorders, and 39 controls) carried 34 distinct rare pathogenic/likely pathogenic heterozygous variants, with no significant frequency differences between cases and controls, except for the p.Ala467Thr variant in the European population. The co-inherited pathogenic variants p.Thr251Ile and p.Pro587Leu were present in <1% of both cases and controls, with no significant group differences. Burden and variant-level association analyses showed no association between rare POLG variant burden or common POLG variant enrichment and PD. Conclusions: POLG variants are overall rare in PD. The identification of rare pathogenic variants among PD cases suggests that POLG-related mitochondrial dysfunction may contribute to PD in isolated instances, particularly under recessive inheritance. Our findings support a role for POLG variants in select cases and underscore the need for larger-scale sequencing and functional studies.

16
A Comparison of Manual and Automated Approaches to Developing Computable Algorithms for Identifying Acute Pancreatitis

Bann, M. A.; Carrell, D. S.; Gruber, S.; Heagerty, P. J.; Williamson, B. D.; Nelson, J. C.; Hazlehurst, B.; Felcher, A.; Nyongesa, D. B.; Slaughter, M. T.; Sapp, D. S.; Cronkite, D. J.; Ball, R.; Floyd, J. S.

2026-06-08 health informatics 10.64898/2026.06.05.26354934 medRxiv
Top 0.9%
0.8%

Objective: Clinical phenotyping methods that rely on clinical and informatics expertise can be time-intensive and costly. We tested both manual and highly automated approaches using electronic health record (EHR) data to identify an FDA Sentinel Initiative health outcome of interest, acute pancreatitis. Materials and Methods: We trained and evaluated machine learning algorithms using EHR data with two approaches: a custom approach that included manually curated features and trained on outcomes data validated with medical record review, and a highly automated approach that greatly simplifies and automates feature engineering and relies on low-cost silver-standard outcomes for model training. Results: Custom algorithms using manually curated structured claims data discriminated cases from non-cases with a high degree of accuracy (cv-AUC 0.89 [95% CI 0.84-0.94]); the inclusion of natural language processing (NLP)-derived covariates from clinical notes increased performance slightly (cv-AUC 0.91 [95% CI 0.86-0.97]). The automated algorithm trained on the outcome count of diagnosis codes performed less well (AUC 0.80 [95% CI 0.75-0.85]) but improved using maximum lipase value as an outcome (AUC 0.88 [95% CI 0.84-0.92]). At a positive predictive value of 90%, the custom algorithm had a sensitivity of 92%, the automated algorithm trained on diagnosis code count had a sensitivity of 45%, and the automated algorithm trained on maximum lipase value had a sensitivity of 84%. However, a prediction rule derived by clinicians during chart review was nearly as accurate (maximum lipase value ≥ 3 times upper limit of normal; AUC 0.86, PPV 85%, sensitivity 92%). Discussion: Machine learning algorithms with manually curated structured data and NLP features trained on validated outcomes data successfully identified validated events. Use of an outcome in the automated model based on specific phenotype knowledge (maximum lipase value) allowed for performance similar to the custom model with considerably fewer resources.

17
Closing the Paediatric Gap: Adult-Trained AI Generalises Robustly to Paediatric Coeliac Disease Diagnosis

Jaeckle, F.; Gillett, P. M.; Kirkwood, K. J.; Natu, S.; Chan, J. Y. H.; Bateman, A. C.; Arends, M. J.; Soilleux, E. J.

2026-06-05 pathology 10.64898/2026.06.04.26354889 medRxiv
Top 0.9%
0.8%

Background: Coeliac disease (CD) diagnosis on duodenal biopsies is limited by interobserver variability. We have previously demonstrated pathologist-level performance with our artificial intelligence (AI) model for the histopathological diagnosis of adult CD, but not in paediatric practice. As paediatric CD screening programmes expand internationally, accurate and scalable diagnostic tools are needed. We investigated whether an AI model trained exclusively on adult whole-slide images (WSIs) can generalise to paediatric CD diagnosis across independent centres. Methods: A training and validation dataset of 9,958 WSIs from 8,421 adult patients (961 CD) from five centres was used to develop an ensemble of multiple-instance learning models using features from a foundation model. Testing was performed on 708 consecutive paediatric patients (86 CD) from two centres (Edinburgh and Southampton) not included in training. Model calibration was assessed, and probability outputs were grouped into clinically interpretable categories. Findings: In adult cross-validation, the AI model achieved an area under the receiver operating characteristic curve (AUC) of 98.7%, sensitivity of 84.9%, specificity of 99.0%, and negative predictive value (NPV) of 98.1%. On testing (paediatric) datasets, performance remained high (AUC 98.8%, sensitivity 80.2%, specificity 98.4%, NPV 97.3%). Restricting analysis to predictions outside the intermediate-probability range (predicted CD probability <10% or ≥65%; 85.3% of cases) improved sensitivity to 100% and specificity to 98.7%. No misclassifications were observed among high-confidence predictions (<2% or ≥85%; 66.0% of cases). The expected calibration error was 0.03. Performance improved significantly when biopsies from both duodenal sites (bulb [D1] and descending [D2/3]) were considered. Interpretation: Our AI model, trained on adult biopsies, generalises to paediatric CD diagnosis across centres and scanner platforms. Well-calibrated probability outputs provide clinically interpretable measures of diagnostic confidence and could support safe identification of CD-negative biopsies within defined thresholds. These findings demonstrate the feasibility of applying adult-derived AI models in paediatric populations and reinforce the importance of multi-site (D1 & D2) biopsy sampling.

18
Liver biopsy confirms precise and efficient correction of SERPINA1 after in vivo Base Editing in a Patient with Alpha-1 Antitrypsin Deficiency

Krooss, S. A.; Yang, T.; Yuan, Q.; Drick, N.; Sgodda, M.; Held, J.; Behrendt, P.; Hartleben, B.; Koczulla, R.; Ma, X.; Liu, Y.; Wedemeyer, H.; Janciauskiene, S.; Di Donato, N.; Cantz, T.; Wang, E.; Wu, Y.; Hoeper, M.; Xia, Q.; Ott, M.

2026-06-09 genetic and genomic medicine 10.64898/2026.06.01.26354551 medRxiv
Top 1.0%
0.8%

Background: Alpha-1 antitrypsin deficiency (AATD) caused by the PI*ZZ mutation (Glu342Lys) results in hepatic accumulation of misfolded AAT-Z protein and reduced circulating AAT levels, leading to progressive liver disease and emphysema. Gene correction therapy represents a potentially curative approach by directly correcting the underlying genetic defect. We report the first case of successful hepatic gene correction with early histological and functional assessment. Methods/Case presentation: We report the case of a 66-year-old male patient with PI*ZZ AATD who underwent gene correction therapy within the YOLT-202 phase I/Ia clinical trial (ClinicalTrials.gov ID NCT07193615). Ten weeks post-treatment, a liver biopsy was performed to re-evaluate pre-existing F2 liver fibrosis as measured by elastography before entering the study. Serum samples allowed functional assessment of the AAT-mediated elastase inhibition. Results: Liver biopsy did not show signs of hepatic inflammation and demonstrated 54% (Sanger) and 57% (Illumina) gene correction rate of the PI*ZZ variant on the DNA level with no bystander edits or off-target effects. Following a transient elevation of transaminases during the early post-treatment period, liver enzymes normalized. Monthly serum AAT measurements demonstrated biologically active and stable therapeutic levels throughout follow-up. Conclusions: This case demonstrates efficient and precise hepatic gene correction without concerning histological alterations and with substantial improvement of functional parameters, supporting the feasibility and safety of gene editing approaches for AATD.

19
Parental educational attainment polygenic scores contribute to phenotypic heterogeneity in offspring with autism

Gao, S.; Sui, Y.; Tian, P.; Rao, X.; Yan, C.; Xu, Y.; Wang, T.

2026-06-08 genetic and genomic medicine 10.64898/2026.06.03.26354779 medRxiv
Top 1%
0.7%

Educational attainment-related polygenic scores have been implicated in autism spectrum disorder (ASD), but how parental polygenic scores shape offspring phenotypes remains unclear. Using genotyping and exome-sequencing data from 142,357 individuals (55,252 ASD cases) in a large ASD cohort, we dissected the direct and indirect genetic effects of educational attainment-related polygenic scores on ASD phenotypes. Trio-model analyses showed that parental polygenic scores for educational attainment (PGSEA) were associated with milder core ASD symptoms, including social deficits and repetitive behaviors, predominantly through indirect genetic effects, whereas their associations with comorbidities were driven predominantly by direct genetic effects. PGSEA was also significantly negatively associated with rare variant burden and prenatal factors, although these factors contributed largely independently to most phenotypes. Adjustment for full-scale intelligence quotient (FSIQ) and socioeconomic status (SES) partially attenuated the indirect effects of PGSEA on offspring phenotypes. Finally, higher parental PGSEA was associated with later age at diagnosis in offspring, partly through its protective effects on ASD phenotypes. These findings indicate that indirect genetic effects of parental PGSEA contribute substantially to phenotypic variation in ASD and highlight family-mediated pathways as an important component of ASD heterogeneity.

20
Trans-ancestry genome-wide association meta-analysis of antidepressant response to selective serotonin reuptake inhibitors in clinical studies of depression

Hu, K.; Lo, C. W. H.; Awasthi, S.; Pain, O.; Singh, M.; Ahn, Y.; Aitchison, K. J.; Baune, B. T.; Biernacka, J. M.; Bondolfi, G.; Carrillo-Roa, T.; Choi, H.; Czamara, D.; Domschke, K.; Fabbri, C.; Hamilton, S. P.; Ising, M.; Jang, Y.; Kato, M.; Kim, D. K.; Kim, D.; Lee, B.-C.; Lewis, G.; Lim, S.-W.; Liu, Y.-L.; Myung, W.; Perroud, N.; Serretti, A.; Tsai, S.-J.; Uher, R.; Weinshilboum, R.; Won, H.-H.; Major Depressive Disorder Working Group of the Psychiatric Genomics Consortium; Ripke, S.; Coleman, J.; Lewis, C. M.

2026-06-04 genetic and genomic medicine 10.64898/2026.06.03.26354703 medRxiv
Top 1%
0.7%

Antidepressants are widely prescribed for major depressive disorder, yet only one-third of patients achieve remission after initial treatment. Previous genome-wide association studies (GWAS) of clinically assessed antidepressant response combined multiple antidepressant classes, potentially obscuring class-specific effects. This study focused on selective serotonin reuptake inhibitors (SSRIs), often first-line due to better tolerability. Data from 15 cohorts across four ancestries were integrated: European (N = 3887; 11 studies), East Asian (N = 1068; 4), African (N = 277; 1), and Admixed American (N = 250; 1). GWAS of non-remission and percentage improvement were conducted within cohorts, followed by ancestry-specific meta-analyses and trans-ancestry meta-regression. Single nucleotide polymorphism (SNP)-based heritability was estimated in European samples. Polygenic scores were used for leave-one-out prediction and to assess shared genetic architecture with psychiatric traits. Gene-level and gene-set enrichment analyses were also performed. No genome-wide significant variants were identified for either outcome in any ancestry-specific or trans-ancestry analyses. However, trans-ancestry meta-regression yielded eight independent loci with suggestive associations (p < 1 × 10⁻⁵) for non-remission and 17 for percentage improvement. Gene-set analyses revealed nominal enrichment of the serotonergic synapse pathway for non-remission. SNP-based heritability estimates were not significantly different from zero for either outcome. Better SSRI response was nominally associated with lower genetic predisposition to major depressive disorder, post-traumatic stress disorder, and schizophrenia. This study represents the largest trans-ancestry GWAS of SSRI response, highlighting emerging biological signals. Limited power emphasises the need for larger and ancestrally diverse cohorts to better characterise the genetic architecture of antidepressant response.